We propose GazeNeRF, a 3D-aware method for the task of gaze redirection. Existing gaze redirection methods operate on 2D images and struggle to generate 3D consistent results. Instead, we build on the intuition that the face region and eyeballs are separate 3D structures that move in a coordinated yet independent fashion. Our method leverages recent advancements in conditional image-based neural radiance fields and proposes a two-stream architecture that predicts volumetric features for the face and eye regions separately. Rigidly transforming the eye features via a 3D rotation matrix provides fine-grained control over the desired gaze angle. The final, redirected image is then attained via differentiable volume compositing. Our experiments show that this architecture outperforms naively conditioned NeRF baselines as well as previous state-of-the-art 2D gaze redirection methods in terms of redirection accuracy and identity preservation.
translated by 谷歌翻译
在鸟眼中学习强大的表现(BEV),以进行感知任务,这是趋势和吸引行业和学术界的广泛关注。大多数自动驾驶算法的常规方法在正面或透视视图中执行检测,细分,跟踪等。随着传感器配置变得越来越复杂,从不同的传感器中集成了多源信息,并在统一视图中代表功能至关重要。 BEV感知继承了几个优势,因为代表BEV中的周围场景是直观和融合友好的。对于BEV中的代表对象,对于随后的模块,如计划和/或控制是最可取的。 BEV感知的核心问题在于(a)如何通过从透视视图到BEV来通过视图转换来重建丢失的3D信息; (b)如何在BEV网格中获取地面真理注释; (c)如何制定管道以合并来自不同来源和视图的特征; (d)如何适应和概括算法作为传感器配置在不同情况下各不相同。在这项调查中,我们回顾了有关BEV感知的最新工作,并对不同解决方案进行了深入的分析。此外,还描述了该行业的BEV方法的几种系统设计。此外,我们推出了一套完整的实用指南,以提高BEV感知任务的性能,包括相机,激光雷达和融合输入。最后,我们指出了该领域的未来研究指示。我们希望该报告能阐明社区,并鼓励对BEV感知的更多研究。我们保留一个活跃的存储库来收集最新的工作,并在https://github.com/openperceptionx/bevperception-survey-recipe上提供一包技巧的工具箱。
translated by 谷歌翻译
联合学习(FL)使移动设备能够在保留本地数据的同时协作学习共享的预测模型。但是,实际上在移动设备上部署FL存在两个主要的研究挑战:(i)频繁的无线梯度更新v.s.频谱资源有限,以及(ii)培训期间渴望的FL通信和本地计算V.S.电池约束的移动设备。为了应对这些挑战,在本文中,我们提出了一种新型的多位空天空计算(MAIRCOMP)方法,用于FL中本地模型更新的频谱有效聚合,并进一步介绍用于移动的能源有效的FL设计设备。具体而言,高精度数字调制方案是在MAIRCOMP中设计和合并的,允许移动设备同时在多访问通道中同时在所选位置上传模型更新。此外,我们理论上分析了FL算法的收敛性。在FL收敛分析的指导下,我们制定了联合传输概率和局部计算控制优化,旨在最大程度地减少FL移动设备的总体能源消耗(即迭代局部计算 +多轮通信)。广泛的仿真结果表明,我们提出的方案在频谱利用率,能源效率和学习准确性方面优于现有计划。
translated by 谷歌翻译
最近已经提出了3D车道检测的方法,以解决许多自动驾驶场景(上坡/下坡,颠簸等)中不准确的车道布局问题。先前的工作在复杂的情况下苦苦挣扎,因为它们对前视图和鸟类视图(BEV)之间的空间转换以及缺乏现实数据集的简单设计。在这些问题上,我们介绍了Persformer:具有新型基于变压器的空间特征变换模块的端到端单眼3D车道检测器。我们的模型通过参考摄像头参数来参与相关的前视本地区域来生成BEV功能。 Persformer采用统一的2D/3D锚设计和辅助任务,以同时检测2D/3D车道,从而提高功能一致性并分享多任务学习的好处。此外,我们发布了第一个大型现实世界3D车道数据集之一:OpenLane,具有高质量的注释和场景多样性。 OpenLane包含200,000帧,超过880,000个实例级别的车道,14个车道类别,以及场景标签和封闭式对象注释,以鼓励开发车道检测和更多与工业相关的自动驾驶方法。我们表明,在新的OpenLane数据集和Apollo 3D Lane合成数据集中,Persformer在3D车道检测任务中的表现明显优于竞争基线,并且在OpenLane上的2D任务中也与最新的算法相当。该项目页面可在https://github.com/openperceptionx/persformer_3dlane上找到,OpenLane数据集可在https://github.com/openperceptionx/openlane上提供。
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Image Virtual try-on aims at replacing the cloth on a personal image with a garment image (in-shop clothes), which has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images, however, occlusion remains a pernicious effect for realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two aspects: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth warps to the unreasonable body part. Based on the in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which can generate semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts a sharpened semantic parsing on the try-on person. Aided by semantics guidance and pose prior, various complexities of texture are selectively blending with human parts in a copy-and-paste manner. Then, the Generative Module (GM) is utilized to take charge of synthesizing the final try-on image and learning to de-occlusion jointly. In comparison to the state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which requires the models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To our best knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
translated by 谷歌翻译
Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.
translated by 谷歌翻译
Off-Policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy. It is critical in a number of sequential decision making problems ranging from healthcare to technology industries. Most of the work in existing literature is focused on evaluating the mean outcome of a given policy, and ignores the variability of the outcome. However, in a variety of applications, criteria other than the mean may be more sensible. For example, when the reward distribution is skewed and asymmetric, quantile-based metrics are often preferred for their robustness. In this paper, we propose a doubly-robust inference procedure for quantile OPE in sequential decision making and study its asymptotic properties. In particular, we propose utilizing state-of-the-art deep conditional generative learning methods to handle parameter-dependent nuisance function estimation. We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform. In particular, we find that our proposed estimator outperforms classical OPE estimators for the mean in settings with heavy-tailed reward distributions.
translated by 谷歌翻译